Querying Linguistic Annotations

Authors

Sumukh Ghodke

Steven Bird

Abstract

Over the past decade, a variety of expressive linguistic query languages have been developed. The most scalable of these have been implemented on top of an existing database engine. However, with the arrival of efficient, wide-coverage parsers, it is feasible to parse text on a scale that is several orders of magnitude larger. We show that the existing database approach will not scale up, and speculate on a new approach that leverages proximity search in the context of an IR engine. We also propose a simple syntax for querying linguistic annotations, avoiding the usability problems with existing tree query languages.
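The abstract only speculates about using proximity search inside an IR engine and does not describe a concrete design. The following is a minimal sketch under stated assumptions, not the authors' method. It assumes each token is indexed both under its lowercased word form and under a hypothetical annotation pseudo-term such as `POS=NN`, so word and tag terms share token positions in a positional inverted index. The functions `build_positional_index` and `near` are illustrative names and do not belong to any particular IR engine.

```python
from bisect import bisect_left
from collections import defaultdict


def build_positional_index(docs):
    """Map each term to {doc_id: sorted list of token positions}.

    Hypothetical scheme (an assumption, not from the source): every token
    is indexed under its lowercased word form AND under a pseudo-term
    'POS=<tag>', so annotation labels become searchable like words.
    """
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, tokens in enumerate(docs):
        for pos, (word, tag) in enumerate(tokens):
            # Positions are appended in increasing order, so lists stay sorted.
            index[word.lower()][doc_id].append(pos)
            index["POS=" + tag][doc_id].append(pos)
    return index


def near(index, term_a, term_b, k):
    """Ordered proximity query: return (doc_id, pos_a, pos_b) triples where
    term_b occurs 1..k positions after term_a in the same document."""
    hits = []
    postings_a = index.get(term_a, {})
    postings_b = index.get(term_b, {})
    # Only documents containing both terms can match.
    for doc_id in sorted(postings_a.keys() & postings_b.keys()):
        positions_b = postings_b[doc_id]
        for pa in postings_a[doc_id]:
            # Binary search for the first term_b position after pa.
            i = bisect_left(positions_b, pa + 1)
            while i < len(positions_b) and positions_b[i] - pa <= k:
                hits.append((doc_id, pa, positions_b[i]))
                i += 1
    return hits


if __name__ == "__main__":
    docs = [
        [("the", "DT"), ("old", "JJ"), ("dog", "NN"), ("barked", "VBD")],
        [("dogs", "NNS"), ("bark", "VBP")],
    ]
    idx = build_positional_index(docs)
    # A determiner followed by a singular noun within two tokens.
    print(near(idx, "POS=DT", "POS=NN", 2))  # [(0, 0, 2)]
```

This flat scheme captures only linear proximity between words and tags; tree relations such as dominance or sisterhood are not represented by token positions alone, so encoding them would require further assumptions beyond this sketch.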


Similar resources

Indexing and Querying Linguistic Metadata and Document Content

The need for efficient corpus indexing and querying arises frequently both in machine learning-based and human-engineered natural language processing systems. This paper presents the ANNIC system, which can index documents not only by content, but also by their linguistic annotations and features. It also enables users to formulate versatile queries mixing keywords and linguistic information...


Knowledge-based Multimodal Data Representation and Querying

This paper focuses on the representation and querying of knowledge-based multimodal data. Our work stands in the multidisciplinary project OTIM (Tools for Multimodal Annotation) dedicated to the development of tools for multimodal annotation of French conversational data. OTIM aims at encoding and manipulating annotations from all the linguistic domains in a unique framework. Defining a data m...


Querying Linguistic Trees

Large databases of linguistic annotations are used for testing linguistic hypotheses and for training language processing models. These linguistic annotations are often syntactic or prosodic in nature, and have a hierarchical structure. Query languages are used to select particular structures of interest, or to project out large slices of a corpus for external analysis. Existing languages suffe...


Identifying complex phenomena in a corpus via a treebank lens

While syntactically annotated corpora known as treebanks have been available for many years, along with a variety of customized tools for querying these annotations, the mapping from actual annotations to relevant syntactic or semantic phenomena has been obscured by the coarse-grained labelling of nodes in the parse trees which make up the treebanks. This lack of linguistic detail has hampered ...


Semantic Technologies for Querying Linguistic Annotations: An Experiment Focusing on Graph-Structured Data

With growing interest in the creation and search of linguistic annotations that form general graphs (in contrast to formally simpler, rooted trees), there also is an increased need for infrastructures that support the exploration of such representations, for example logical-form meaning representations or semantic dependency graphs. In this work, we heavily lean on semantic technologies and in ...



Publication date: 2008
